Introduction to HPC

Overview

Manuel Holtgrewe

Berlin Institute of Health at Charité

Course Overview

  • Welcome to the course! 👋
  • Introduction to High-Performance Computing (HPC)
  • Focus on biomedical and medical research applications

🔬 ⌨️ 🧬

  • Duration: [duration of the course]
  • Instructor: Manuel Holtgrewe
  • Contact Information: manuel.holtgrewe@bih-charite.de

Course Objectives

  • Fundamentals of HPC …
  • … for biomedical research
  • Practical skills in
    • Linux command line
    • HPC job submission
    • Scientific programming
  • Parallel computing techniques and their applications

Participant Background 🤸

  • Briefly introduce yourself:
    • Name
    • Background (biomedical/computational, programming experience, etc.)
    • Expectations from the course

Prerequisites 🎓

You should have experience with…

  • … the Linux operating system 🐧
  • … the Bash shell (interactive, scripting) 🐚
  • … using the Secure Shell (SSH) 🛡️
  • … one programming language (ideally: Python 🐍)
  • … (some exposure to) scientific programming / machine learning

Also: you need an account on the BIH HPC 🔑
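Connecting via SSH is easier with a host alias in your SSH client configuration; a minimal sketch, where the host name and user name are placeholders to be replaced with the actual values from the BIH HPC documentation:

```
# ~/.ssh/config — host alias sketch; host name and user are placeholders,
# check the BIH HPC documentation for the actual login node name
Host bihcluster
    HostName hpc-login-1.cubi.bihealth.org
    User your_username
```

With this in place, `ssh bihcluster` is all you need to type.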

What is High-Performance Computing?

  • Attempt at a definition
  • Role of HPC in biomedical and medical research
  • Trade-Offs

Attempt at a Definition: HPC …

  • refers to advanced computing techniques & technologies to solve complex computational problems efficiently
  • involves leveraging parallel processing, large-scale data analysis, and specialized hardware
    • to achieve high computational performance
  • systems consist of multiple computing nodes connected through a high-speed network, working together
  • enables researchers to tackle computationally intensive tasks that would be infeasible or too time-consuming otherwise
  • finds applications in a wide range of fields, including scientific research, engineering, data analytics, and machine learning

HPC in Biomedical Research

  • … plays a crucial role by enabling researchers to tackle computational challenges at scale
  • … allows for analyzing large-scale genomics, proteomics, …, datasets
    • leading to insights into diseases and potential treatments
  • … facilitates simulations such as protein folding, molecule interactions, etc.
  • … enables the efficient training of large-scale statistical and machine learning models

Trade-Offs of HPC

Advantages

  • fast execution of complex computational tasks
  • process and analyze large data sets
  • fast and large storage systems
  • MORE POWER 🦾

Drawbacks

  • learning curve / entry barrier
  • usually shared with other users
  • expensive to buy/operate
  • high power usage/CO2 footprint (reference)
  • “why is my job killed/crashing/not running?” 😶‍🌫️

There is no free lunch!


What is Your Take? 🤸

“Blitzlicht” (lightning round) 📸

  • answer one of the questions
  • do not repeat a previous answer

So far

  • have you benefited from advantages?
  • have you suffered from drawbacks?

From here on

  • what do you hope to gain from using HPC?
  • what risks do you see?

HPC Systems and Architecture

  • Compute nodes
  • Shared memory vs. distributed memory systems
  • Cluster architecture
  • Distributed file systems
  • Job schedulers and resource management

⚠️ “Warning”: just a quick and superficial overview ;-)

Compute Nodes (1)

“Same-same (as your laptop), but different.”

  • 2+ sockets with
    • many-core CPUs
    • e.g., 2 sockets x 24 cores x 2 hyperthreads = 96 threads
  • high memory (e.g., 200+ GB)
  • fast network interface card
    • “legacy”: 10GbE (x2)
    • modern: 25GbE (x2)
  • local disks
    • HDD or solid-state SSD/NVMe
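The thread count in the example is plain socket × core × hyperthread arithmetic; a quick sketch using the example values from above (on a real node, `lscpu` reports the actual counts):

```shell
# hardware threads = sockets x cores per socket x threads per core
# (values are the example from the slide, not probed from hardware)
sockets=2
cores_per_socket=24
threads_per_core=2
echo $((sockets * cores_per_socket * threads_per_core))  # → 96
```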

Compute Nodes (2)

More differences from “consumer-grade” hardware:

  • error correcting memory (bit flips are real)
    • Google in 2009: 8% of DIMMs have 1+ 1bit errors/year, 0.2% of DIMMs have 1+ 2bit errors/year
  • stronger fans
  • redundant power control
  • redundant disks

You are not the admin

no root/admin access, no sudo

Shared vs. Distributed Memory

Shared Memory

  • in-core/multi-threading
```mermaid
graph BT
    sq1[thread 1] --> ci(memory address)
    sq2[thread 2] --> ci(memory address)
```
  • ➕ low overhead
  • ➕ easy to get started
  • ➖ implicit communication, easy to make errors
  • ➖ do you really understand your memory model?

Distributed Memory

  • out-of-core/message-passing
```mermaid
graph LR
    sq1[thread 1] -->|How are you?| sq2[thread 2]
    sq2[thread 2] -->|- fine, thanks!| sq1[thread 1]
```
  • ➕ explicit communication, fewer wrong assumptions(?)
  • ➕ model scales up better for larger systems
  • ➖ harder to get started
  • ➖ more complex primitives
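The message-passing idea can be illustrated without any MPI, using two shell processes and a named pipe; a toy sketch (the channel path is generated per run, the message is made up):

```shell
# toy message passing: two processes communicate explicitly via a named pipe
chan=$(mktemp -u)          # hypothetical per-run channel path
mkfifo "$chan"

echo "How are you?" > "$chan" &   # "process 1" sends (blocks until read)

msg=$(cat "$chan")                # "process 2" receives; no shared memory
wait
rm -f "$chan"
echo "received: $msg"             # → received: How are you?
```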

Your Experience? 🤸

“Blitzlicht” (lightning round) 📸

  • answer one of the questions
  • do not repeat a previous answer
  • have you used shared/distributed memory parallelism before?
  • what is your experience/hope?

Cluster Architecture

  • head nodes (login/transfer)
  • compute nodes
    • generic: cpu
    • specialized: high-mem/gpus
  • storage cluster with parallel file system
  • scheduler to orchestrate jobs
  • Network/Interconnect

Distributed File Systems (1)

“Same-same (as your laptop), but different.”

  • POSIX file system
    • laptop: ext4/XFS/btrfs/ZFS/…
    • distributed: CephFS, GPFS/SpectrumScale, BeeGFS, …
  • POSIX guarantees
    • sync() → visible everywhere
    • mkdir/open() → visible everywhere
    • files can be opened by multiple processes
  • … harder to enforce in a distributed (multi-node) setting

Distributed File Systems (2)

Best Practices / Do’s & Don’ts

  • use modest-sized directories (<10k entries)
    • don’t create one file per gene (or similar)
    • create subdirectories, e.g., abcdefab/cdef
  • don’t splurge on file count
    • don’t create one file per NGS read (or similar)
  • avoid recursive traversal of large structures
    • ls -lR will be slow!
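The subdirectory-sharding advice above can be sketched as follows; the gene-style file name is hypothetical, and the two-character checksum prefix keeps any single directory small:

```shell
# place each output file into a subdirectory named after its checksum prefix,
# so no single directory accumulates tens of thousands of entries
name="ENSG00000141510.json"                         # hypothetical file name
prefix=$(printf '%s' "$name" | md5sum | cut -c1-2)  # two hex chars, e.g. "3f"
mkdir -p "out/$prefix"
echo "out/$prefix/$name"
```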

Distributed File Systems (3)

Best Practices / Do’s & Don’ts

  • avoid small reads/writes and random access
    • each I/O operation (IOP) needs to go through the network
    • I/O systems are better at handling larger/sequential reads/writes
  • DO stream through your files
    • for each line/record in file: # do work
  • DO use Unix sort
  • DO use Unix pipelines rather than temporary files
    • e.g., seqtk mergepe R1.fq R2.fq | bwa mem -p ref.fa - | samtools sort -O BAM -o out.bam -
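A self-contained illustration of streaming through records with a Unix pipeline — per-key sums over tab-separated input without any temporary files (the records are made up):

```shell
# stream records through a pipeline: sequential I/O, no temporary files
printf 'chr1\t10\nchr2\t5\nchr1\t7\n' \
  | sort -k1,1 \
  | awk '{sum[$1] += $2} END {for (c in sum) print c, sum[c]}' \
  | sort   # awk's END iteration order is unspecified, so sort the summary
# → chr1 17
# → chr2 5
```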

Job Scheduler and Resource Management

```mermaid
sequenceDiagram
    autonumber
    User-)+Scheduler: sbatch $resource_args jobscript.sh
    Scheduler->>+Scheduler: add job to queue
    User-)+Scheduler: squeue / scontrol show job
    Scheduler-->>+User: job status
    Note right of Scheduler: scheduler loop
    Scheduler-)Compute Node: start job
    Compute Node->>Compute Node: execute job
    Compute Node-)+Scheduler: job complete
```
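The `sbatch $resource_args jobscript.sh` call in the diagram submits a script like the following; a minimal sketch in which the job name, resource values, and script body are placeholders (the actual partition and resource conventions are in the BIH HPC documentation and covered in the Slurm session):

```shell
#!/bin/bash
# jobscript.sh — minimal Slurm job script sketch; all values are placeholders
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# outside of Slurm, SLURM_CPUS_PER_TASK is unset, so default to 1
echo "running on $(hostname) with ${SLURM_CPUS_PER_TASK:-1} CPUs"
```

Submit with `sbatch jobscript.sh` and check the queue with `squeue -u $USER`, as in the diagram.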

Further Sessions

  1. Overview (this)
  2. Slurm Job Scheduler and Resource Manager
  3. Scientific Programming with Python
  4. Reproducible Workflows with Snakemake

This is not the end…

… but all for the first session

Recap

  • prerequisites
  • HPC in general and biomedical sciences
  • HPC hardware and cluster architecture
  • distributed file systems
  • a first peek at job schedulers
  • further sessions